Self-supervised Learning


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

  1. Supervised Learning and Transfer Learning
  2. Pretext Tasks
  3. Self-supervised Learning
  4. Self-supervised Learning with TensorFlow

1. Supervised Learning and Transfer Learning

Supervised pretraining on large labeled datasets has led to successful transfer learning

  • ImageNet

  • Pretrain on fine-grained image classification over 1,000 classes



  • Use feature representations for downstream tasks, e.g., object detection, image segmentation, and action recognition



But supervised pretraining comes at a cost …

  • Time-consuming and expensive to label datasets for new tasks
  • Domain expertise needed for specialized tasks
    • Radiologists to label medical images
    • Native speakers or language specialists for labeling text in different languages
  • To relieve the burden of labeling:
    • Semi-supervised learning
    • Weakly-supervised learning
    • Unsupervised learning

Self-supervised learning

  • Self-supervised learning (SSL): supervise the model using labels generated from the data itself, without any manual or weak label sources
    • A sub-class of unsupervised learning
  • Idea: hide or modify part of the input, then ask the model to recover the input or classify what was changed
    • The self-supervised task, referred to as the pretext task, can be formulated using only unlabeled data
    • The features learned on pretext tasks are transferred to downstream tasks such as classification, object detection, and segmentation



Pretext Tasks

  • Solving pretext tasks allows the model to learn good features.

  • We can automatically generate labels for the pretext tasks.



2. Pretext Tasks

2.1. Pretext Task - Context Prediction

  • After cropping 9 patches from one input image, a classifier is trained to predict the relative location of a surrounding patch with respect to the middle patch
  • A pair consisting of the middle patch and one of the surrounding patches is given as input to the network
  • Method to avoid trivial solutions (see the sketch below)
    • uneven spacing between patches
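
A minimal data-generation sketch, assuming a single NumPy image large enough for the 3x3 patch grid; the patch size, gap, and jitter values here are illustrative assumptions, not the settings of the original paper:

import numpy as np

def context_prediction_pair(img, patch_size = 96, gap = 48):
    # Sample one (center patch, neighbor patch, label) training pair;
    # label in {0, ..., 7} indexes the neighbor's position around the center
    H, W = img.shape[:2]
    s = patch_size
    cy, cx = H // 2, W // 2
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               ( 0, -1),          ( 0, 1),
               ( 1, -1), ( 1, 0), ( 1, 1)]
    label = np.random.randint(8)
    dy, dx = offsets[label]
    # uneven spacing: a fixed gap plus random jitter between patches
    jy, jx = np.random.randint(-4, 5, size = 2)
    ny, nx = cy + dy * (s + gap) + jy, cx + dx * (s + gap) + jx
    crop = lambda y, x: img[y - s//2 : y + s//2, x - s//2 : x + s//2]
    return crop(cy, cx), crop(ny, nx), label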



Carl Doersch, Abhinav Gupta, Alexei A. Efros, 2015, "Unsupervised Visual Representation Learning by Context Prediction," Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1422-1430.

2.2. Pretext Task - Jigsaw Puzzle

  • Generate 9 patches from the input image
  • After shuffling the patches, train a classifier that predicts which permutation was applied, i.e., how to return the patches to their original positions (see the sketch below)
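
A minimal sketch of the jigsaw data generation, assuming a square NumPy image split into a 3x3 grid; the toy 4-permutation set is illustrative only (the paper selects a fixed subset of the 9! possible permutations):

import numpy as np

# e.g., a toy set of 4 candidate permutations over the 9 patch positions
perm_set = [np.random.permutation(9) for _ in range(4)]

def jigsaw_example(img, perm_set):
    # Split img into a 3x3 grid, shuffle the patches by a randomly chosen
    # permutation, and return the permutation index as the class label
    H, W = img.shape[:2]
    ph, pw = H // 3, W // 3
    patches = [img[r*ph:(r+1)*ph, c*pw:(c+1)*pw]
               for r in range(3) for c in range(3)]
    label = np.random.randint(len(perm_set))
    shuffled = np.stack([patches[j] for j in perm_set[label]])
    return shuffled, label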





Noroozi, M., and Favaro, P., 2016, "Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles," Computer Vision – ECCV 2016, 69–84.

2.3. Pretext Task - Image Colorization

  • Given a grayscale photograph as input, image colorization attacks the problem of hallucinating a plausible color version of the photograph
  • Transfer the trained encoder to the downstream task



Zhang, R., Isola, P., and Efros, A. A., 2016, "Colorful Image Colorization," Computer Vision – ECCV 2016, 649–666.

  • Training data generation for self-supervised learning
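
Training pairs come for free from any collection of color images: the grayscale conversion is the input and the original image is the target. A minimal sketch, assuming rgb_images is a float tensor of shape (N, H, W, 3); note that the paper actually predicts a distribution over quantized ab channels in Lab color space, so the direct grayscale-to-RGB pairing here is a simplification:

import tensorflow as tf

def colorization_pairs(rgb_images):
    # Inputs are grayscale conversions; targets are the original color images
    gray = tf.image.rgb_to_grayscale(rgb_images)   # (N, H, W, 1)
    return gray, rgb_images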



  • Network architecture



2.4. Pretext Task - Image Super-resolution

  • What if we prepared training pairs of (downsampled, original) images by downsampling millions of images that are freely available?
  • Training data generation for self-supervised learning
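
A minimal sketch of this pair generation, assuming hr_images has shape (N, H, W, C) with H and W divisible by the scale factor:

import tensorflow as tf

def superres_pairs(hr_images, scale = 4):
    # Downsampled copies are the inputs; the originals are the targets
    n, h, w, c = hr_images.shape
    lr = tf.image.resize(hr_images, (h // scale, w // scale),
                         method = 'bicubic')
    return lr, hr_images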



  • Network architecture



2.5. Pretext Task - Image Inpainting

  • What if we prepared training pairs of (corrupted, original) images by randomly removing part of each image?
  • Training data generation for self-supervised learning
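
A minimal sketch that removes one random square region per image, assuming images is a float NumPy array of shape (N, H, W, ...); the hole size and zero-masking scheme are illustrative assumptions:

import numpy as np

def inpainting_pairs(images, hole = 8):
    # Zero out one random square region per image; the untouched
    # originals serve as the reconstruction targets
    corrupted = images.copy()
    n, h, w = images.shape[:3]
    for i in range(n):
        y = np.random.randint(0, h - hole)
        x = np.random.randint(0, w - hole)
        corrupted[i, y:y + hole, x:x + hole] = 0.0
    return corrupted, images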



  • Network architecture



3. Self-supervised Learning

Benefits of Self-supervised Learning

  • Like supervised pretraining, can learn general-purpose feature representations for downstream tasks
  • Reduce expense of hand-labeling large datasets
  • Can leverage nearly unlimited unlabeled data available on the web

Pipeline of Self-supervised Learning

  1. Within the pretext task, a deep neural network learns visual features from unlabeled input data
  2. The learned parameters of the network are then fixed, and the trained network serves as a pre-trained model for downstream tasks
  3. The pre-trained model is transferred to downstream tasks and fine-tuned
  4. The performance on downstream tasks is used to evaluate how well the pretext-task methodology learns features from unlabeled data

Section 4 implements this pipeline end-to-end with a rotation pretext task.



Jing, L., & Tian, Y., 2021, "Self-supervised visual feature learning with Deep Neural Networks: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11), 4037–4058.

Downstream Tasks

  • After transferring the neural network pre-trained by the pretext task, freeze the weights and build additional layers for the downstream tasks

  • Wide variety of downstream tasks

    • Classification
    • Regression
    • Object detection
    • Segmentation



4. Self-supervised Learning with TensorFlow

Pretext Task - Rotation

  • RotNet
  • Hypothesis: a model could recognize the correct rotation of an object only if it has the “visual commonsense” of what the object should look like

    • Self-supervised learning by rotating entire input images
    • The model learns to predict which rotation was applied (4-way classification)



  • RotNet: Supervised vs Self-supervised
    • The accuracy gap between the RotNet-based model and the fully supervised Network-In-Network (NIN) model is very small, only 1.64 percentage points
    • The RotNet-based model needs no data labels for training, yet it achieves accuracy similar to that of the model trained with labels

Import Library

In [1]:
import tensorflow as tf
import numpy as np 
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
import matplotlib.pyplot as plt

Load MNIST Data

In [2]:
(X_train, Y_train), (X_test, Y_test) = keras.datasets.mnist.load_data()
# use a small subset of MNIST for fast training
X_train = X_train[:1000]
Y_train = Y_train[:1000]
X_test = X_test[:300]
Y_test = Y_test[:300]
In [3]:
print('shape of x_train:', X_train.shape)
print('shape of y_train:', Y_train.shape)
print('shape of x_test:', X_test.shape)
print('shape of y_test:', Y_test.shape)
shape of x_train: (1000, 28, 28)
shape of y_train: (1000,)
shape of x_test: (300, 28, 28)
shape of y_test: (300,)

4.1. Build RotNet for Pretext Task



Dataset for Pretext Task (Rotation)

  • Need to generate rotated images and their labels to train the model for the pretext task
    • [1, 0, 0, 0]: 0$^\circ$ rotation
    • [0, 1, 0, 0]: 90$^\circ$ rotation
    • [0, 0, 1, 0]: 180$^\circ$ rotation
    • [0, 0, 0, 1]: 270$^\circ$ rotation
In [4]:
n_samples = X_train.shape[0]
X_rotate = np.zeros(shape = (n_samples*4, 
                             X_train.shape[1], 
                             X_train.shape[2]))
Y_rotate = np.zeros(shape = (n_samples*4, 4))

for i in range(n_samples):    
    img = X_train[i]
    
    # 0 degrees rotation (original image)
    X_rotate[4*i] = img
    Y_rotate[4*i] = tf.one_hot([0], depth = 4)
    
    # 90 degrees rotation
    X_rotate[4*i + 1] = np.rot90(img, k = 1)
    Y_rotate[4*i + 1] = tf.one_hot([1], depth = 4)
    
    # 180 degrees rotation
    X_rotate[4*i + 2] = np.rot90(img, k = 2)
    Y_rotate[4*i + 2] = tf.one_hot([2], depth = 4) 
    
    # 270 degrees rotation
    X_rotate[4*i + 3] = np.rot90(img, k = 3)
    Y_rotate[4*i + 3] = tf.one_hot([3], depth = 4)

Plot Dataset for Pretext Task (Rotation)

In [5]:
plt.subplots(figsize = (10, 10))

plt.subplot(141)
plt.imshow(X_rotate[12], cmap = 'gray')
plt.axis('off') 

plt.subplot(142)
plt.imshow(X_rotate[13], cmap = 'gray')
plt.axis('off') 

plt.subplot(143)
plt.imshow(X_rotate[14], cmap = 'gray')
plt.axis('off') 

plt.subplot(144)
plt.imshow(X_rotate[15], cmap = 'gray')
plt.axis('off') 
Out[5]:
(-0.5, 27.5, 27.5, -0.5)
In [6]:
X_rotate = X_rotate.reshape(-1,28,28,1)

Build Model for Pretext Task (Rotation)

In [7]:
model_pretext = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 64, 
                           kernel_size = (3,3), 
                           strides = (2,2), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),
    
    tf.keras.layers.MaxPool2D(pool_size = (2, 2), 
                              strides = (2, 2)),
    
    tf.keras.layers.Conv2D(filters = 32, 
                           kernel_size = (3,3), 
                           strides = (1,1), 
                           activation = 'relu',
                           padding = 'SAME'),    # input shape: (7, 7, 64)
    
    tf.keras.layers.MaxPool2D(pool_size = (2, 2), 
                              strides = (2, 2)),
    
    tf.keras.layers.Conv2D(filters = 16, 
                           kernel_size = (3,3),
                           strides = (2,2), 
                           activation = 'relu',
                           padding = 'SAME'),    # input shape: (3, 3, 32)
    
    tf.keras.layers.Flatten(),
    
    tf.keras.layers.Dense(units = 4, activation = 'softmax')    
])

model_pretext.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 14, 14, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 7, 7, 64)          0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 7, 7, 32)          18464     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 3, 3, 32)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 2, 2, 16)          4624      
_________________________________________________________________
flatten (Flatten)            (None, 64)                0         
_________________________________________________________________
dense (Dense)                (None, 4)                 260       
=================================================================
Total params: 23,988
Trainable params: 23,988
Non-trainable params: 0
_________________________________________________________________
  • Training the model for the pretext task



In [8]:
model_pretext.compile(optimizer = 'adam',
              loss = 'categorical_crossentropy',
              metrics = 'accuracy')

model_pretext.fit(X_rotate, 
                  Y_rotate, 
                  batch_size = 32, 
                  epochs = 30,
                  verbose = 2, 
                  shuffle = False)
Epoch 1/30
125/125 - 1s - loss: 1.5175 - accuracy: 0.6812
Epoch 2/30
125/125 - 1s - loss: 0.3883 - accuracy: 0.8735
Epoch 3/30
125/125 - 1s - loss: 0.2628 - accuracy: 0.9095
Epoch 4/30
125/125 - 1s - loss: 0.2175 - accuracy: 0.9258
Epoch 5/30
125/125 - 1s - loss: 0.1658 - accuracy: 0.9377
Epoch 6/30
125/125 - 1s - loss: 0.0997 - accuracy: 0.9640
Epoch 7/30
125/125 - 1s - loss: 0.0805 - accuracy: 0.9705
Epoch 8/30
125/125 - 1s - loss: 0.0692 - accuracy: 0.9712
Epoch 9/30
125/125 - 1s - loss: 0.0806 - accuracy: 0.9697
Epoch 10/30
125/125 - 1s - loss: 0.1045 - accuracy: 0.9613
Epoch 11/30
125/125 - 1s - loss: 0.0675 - accuracy: 0.9765
Epoch 12/30
125/125 - 1s - loss: 0.0523 - accuracy: 0.9803
Epoch 13/30
125/125 - 1s - loss: 0.0662 - accuracy: 0.9747
Epoch 14/30
125/125 - 1s - loss: 0.0342 - accuracy: 0.9872
Epoch 15/30
125/125 - 1s - loss: 0.0456 - accuracy: 0.9818
Epoch 16/30
125/125 - 1s - loss: 0.0324 - accuracy: 0.9872
Epoch 17/30
125/125 - 1s - loss: 0.0247 - accuracy: 0.9908
Epoch 18/30
125/125 - 1s - loss: 0.0801 - accuracy: 0.9705
Epoch 19/30
125/125 - 1s - loss: 0.0401 - accuracy: 0.9840
Epoch 20/30
125/125 - 1s - loss: 0.0593 - accuracy: 0.9790
Epoch 21/30
125/125 - 1s - loss: 0.0136 - accuracy: 0.9950
Epoch 22/30
125/125 - 1s - loss: 0.0365 - accuracy: 0.9902
Epoch 23/30
125/125 - 1s - loss: 0.0723 - accuracy: 0.9750
Epoch 24/30
125/125 - 1s - loss: 0.0306 - accuracy: 0.9900
Epoch 25/30
125/125 - 1s - loss: 0.0296 - accuracy: 0.9905
Epoch 26/30
125/125 - 1s - loss: 0.0496 - accuracy: 0.9852
Epoch 27/30
125/125 - 1s - loss: 0.0304 - accuracy: 0.9885
Epoch 28/30
125/125 - 1s - loss: 0.0255 - accuracy: 0.9915
Epoch 29/30
125/125 - 1s - loss: 0.0098 - accuracy: 0.9952
Epoch 30/30
125/125 - 1s - loss: 0.0581 - accuracy: 0.9833
Out[8]:
<tensorflow.python.keras.callbacks.History at 0x1a8ba933160>

4.2. Build Downstream Task (MNIST Image Classification)

  • Freeze the trained parameters to transfer them to the downstream task
In [9]:
model_pretext.trainable = False

Reshape Dataset

In [10]:
X_train = X_train.reshape(-1,28,28,1)
X_test = X_test.reshape(-1,28,28,1)
Y_train = tf.one_hot(Y_train, 10, on_value = 1.0, off_value = 0.0)
Y_test = tf.one_hot(Y_test, 10, on_value = 1.0, off_value = 0.0)

Build Model

  • Model: two convolution layers and one fully connected layer
    • Two convolution layers are transferred from the model for the pretext task
    • Only the single fully connected layer is trained



In [11]:
model_downstream = tf.keras.models.Sequential([ 
    # frozen layers transferred from the pretext model: conv, pool, conv, pool
    model_pretext.get_layer(index = 0),    
    model_pretext.get_layer(index = 1),
    model_pretext.get_layer(index = 2),
    model_pretext.get_layer(index = 3), 
    
    # newly added classification head: the only trainable part
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

model_downstream.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 14, 14, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 7, 7, 64)          0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 7, 7, 32)          18464     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 3, 3, 32)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 288)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2890      
=================================================================
Total params: 21,994
Trainable params: 2,890
Non-trainable params: 19,104
_________________________________________________________________
In [12]:
model_downstream.compile(optimizer = 'adam',
                         loss = 'categorical_crossentropy',
                         metrics = 'accuracy')

model_downstream.fit(X_train, 
                     Y_train, 
                     batch_size = 64, 
                     validation_split = 0.2, 
                     epochs = 50, 
                     verbose = 2)
Epoch 1/50
13/13 - 0s - loss: 25.5906 - accuracy: 0.1000 - val_loss: 14.9224 - val_accuracy: 0.1250
Epoch 2/50
13/13 - 0s - loss: 13.1454 - accuracy: 0.1625 - val_loss: 11.0325 - val_accuracy: 0.1850
Epoch 3/50
13/13 - 0s - loss: 8.7610 - accuracy: 0.2750 - val_loss: 7.3739 - val_accuracy: 0.3150
Epoch 4/50
13/13 - 0s - loss: 6.0769 - accuracy: 0.4100 - val_loss: 5.8050 - val_accuracy: 0.4400
Epoch 5/50
13/13 - 0s - loss: 4.6820 - accuracy: 0.4988 - val_loss: 5.0866 - val_accuracy: 0.4900
Epoch 6/50
13/13 - 0s - loss: 3.7179 - accuracy: 0.5825 - val_loss: 4.4287 - val_accuracy: 0.5600
Epoch 7/50
13/13 - 0s - loss: 3.0344 - accuracy: 0.6325 - val_loss: 4.0820 - val_accuracy: 0.5900
Epoch 8/50
13/13 - 0s - loss: 2.5707 - accuracy: 0.6675 - val_loss: 3.6709 - val_accuracy: 0.6200
Epoch 9/50
13/13 - 0s - loss: 2.2272 - accuracy: 0.7175 - val_loss: 3.5394 - val_accuracy: 0.6250
Epoch 10/50
13/13 - 0s - loss: 1.9458 - accuracy: 0.7362 - val_loss: 3.2625 - val_accuracy: 0.6400
Epoch 11/50
13/13 - 0s - loss: 1.7318 - accuracy: 0.7650 - val_loss: 3.0871 - val_accuracy: 0.6650
Epoch 12/50
13/13 - 0s - loss: 1.5282 - accuracy: 0.8000 - val_loss: 2.9412 - val_accuracy: 0.6550
Epoch 13/50
13/13 - 0s - loss: 1.3787 - accuracy: 0.8062 - val_loss: 2.7774 - val_accuracy: 0.6850
Epoch 14/50
13/13 - 0s - loss: 1.2190 - accuracy: 0.8175 - val_loss: 2.5989 - val_accuracy: 0.6850
Epoch 15/50
13/13 - 0s - loss: 1.1399 - accuracy: 0.8462 - val_loss: 2.5410 - val_accuracy: 0.7050
Epoch 16/50
13/13 - 0s - loss: 1.0576 - accuracy: 0.8438 - val_loss: 2.3818 - val_accuracy: 0.7050
Epoch 17/50
13/13 - 0s - loss: 0.9242 - accuracy: 0.8550 - val_loss: 2.2165 - val_accuracy: 0.7150
Epoch 18/50
13/13 - 0s - loss: 0.8928 - accuracy: 0.8612 - val_loss: 2.1766 - val_accuracy: 0.6950
Epoch 19/50
13/13 - 0s - loss: 0.7750 - accuracy: 0.8775 - val_loss: 2.1327 - val_accuracy: 0.7050
Epoch 20/50
13/13 - 0s - loss: 0.6905 - accuracy: 0.8825 - val_loss: 2.0182 - val_accuracy: 0.7300
Epoch 21/50
13/13 - 0s - loss: 0.6100 - accuracy: 0.8975 - val_loss: 1.9755 - val_accuracy: 0.7150
Epoch 22/50
13/13 - 0s - loss: 0.5780 - accuracy: 0.9025 - val_loss: 1.8801 - val_accuracy: 0.7450
Epoch 23/50
13/13 - 0s - loss: 0.5530 - accuracy: 0.8938 - val_loss: 1.8851 - val_accuracy: 0.7450
Epoch 24/50
13/13 - 0s - loss: 0.5169 - accuracy: 0.9000 - val_loss: 1.8776 - val_accuracy: 0.7250
Epoch 25/50
13/13 - 0s - loss: 0.4456 - accuracy: 0.9187 - val_loss: 1.7259 - val_accuracy: 0.7500
Epoch 26/50
13/13 - 0s - loss: 0.4160 - accuracy: 0.9250 - val_loss: 1.7258 - val_accuracy: 0.7300
Epoch 27/50
13/13 - 0s - loss: 0.3695 - accuracy: 0.9312 - val_loss: 1.6814 - val_accuracy: 0.7600
Epoch 28/50
13/13 - 0s - loss: 0.3376 - accuracy: 0.9400 - val_loss: 1.6666 - val_accuracy: 0.7300
Epoch 29/50
13/13 - 0s - loss: 0.3307 - accuracy: 0.9350 - val_loss: 1.6489 - val_accuracy: 0.7500
Epoch 30/50
13/13 - 0s - loss: 0.3037 - accuracy: 0.9425 - val_loss: 1.6382 - val_accuracy: 0.7400
Epoch 31/50
13/13 - 0s - loss: 0.2819 - accuracy: 0.9488 - val_loss: 1.5674 - val_accuracy: 0.7500
Epoch 32/50
13/13 - 0s - loss: 0.2371 - accuracy: 0.9575 - val_loss: 1.5481 - val_accuracy: 0.7550
Epoch 33/50
13/13 - 0s - loss: 0.2198 - accuracy: 0.9625 - val_loss: 1.5185 - val_accuracy: 0.7650
Epoch 34/50
13/13 - 0s - loss: 0.2000 - accuracy: 0.9625 - val_loss: 1.5104 - val_accuracy: 0.7700
Epoch 35/50
13/13 - 0s - loss: 0.1856 - accuracy: 0.9712 - val_loss: 1.4932 - val_accuracy: 0.7600
Epoch 36/50
13/13 - 0s - loss: 0.1650 - accuracy: 0.9750 - val_loss: 1.4575 - val_accuracy: 0.7600
Epoch 37/50
13/13 - 0s - loss: 0.1523 - accuracy: 0.9712 - val_loss: 1.4905 - val_accuracy: 0.7650
Epoch 38/50
13/13 - 0s - loss: 0.1477 - accuracy: 0.9750 - val_loss: 1.4900 - val_accuracy: 0.7450
Epoch 39/50
13/13 - 0s - loss: 0.1600 - accuracy: 0.9688 - val_loss: 1.4587 - val_accuracy: 0.7500
Epoch 40/50
13/13 - 0s - loss: 0.1299 - accuracy: 0.9750 - val_loss: 1.4632 - val_accuracy: 0.7750
Epoch 41/50
13/13 - 0s - loss: 0.1268 - accuracy: 0.9725 - val_loss: 1.3950 - val_accuracy: 0.7700
Epoch 42/50
13/13 - 0s - loss: 0.1076 - accuracy: 0.9787 - val_loss: 1.4392 - val_accuracy: 0.7550
Epoch 43/50
13/13 - 0s - loss: 0.1061 - accuracy: 0.9812 - val_loss: 1.3762 - val_accuracy: 0.7900
Epoch 44/50
13/13 - 0s - loss: 0.0878 - accuracy: 0.9887 - val_loss: 1.3690 - val_accuracy: 0.7750
Epoch 45/50
13/13 - 0s - loss: 0.0812 - accuracy: 0.9875 - val_loss: 1.3862 - val_accuracy: 0.7900
Epoch 46/50
13/13 - 0s - loss: 0.0709 - accuracy: 0.9887 - val_loss: 1.3481 - val_accuracy: 0.7800
Epoch 47/50
13/13 - 0s - loss: 0.0624 - accuracy: 0.9912 - val_loss: 1.3344 - val_accuracy: 0.7750
Epoch 48/50
13/13 - 0s - loss: 0.0586 - accuracy: 0.9912 - val_loss: 1.3593 - val_accuracy: 0.7650
Epoch 49/50
13/13 - 0s - loss: 0.0562 - accuracy: 0.9950 - val_loss: 1.3463 - val_accuracy: 0.7750
Epoch 50/50
13/13 - 0s - loss: 0.0518 - accuracy: 0.9912 - val_loss: 1.3221 - val_accuracy: 0.7850
Out[12]:
<tensorflow.python.keras.callbacks.History at 0x1a8bc7779b0>

Downstream Task Trained Result (Image Classification Result)

In [13]:
name = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
idx = 9
img = X_train[idx].reshape(-1,28,28,1)
label = Y_train[idx]
predict = model_downstream.predict(img)    # class probabilities
mypred = np.argmax(predict, axis = 1)      # predicted digit

plt.figure(figsize = (12,5))
plt.subplot(1,2,1)
plt.imshow(img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(name[mypred[0]]))
Prediction : 4

4.3. Build Supervised Model for Comparison

  • Convolutional neural network for MNIST image classification
    • Model: same architecture as the model for the downstream task
    • The total number of parameters is the same as in the downstream model, but all of them are trainable (zero non-trainable parameters)
In [14]:
model_sup = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 64, 
                           kernel_size = (3,3), 
                           strides = (2,2), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),
    
    tf.keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2)),
    
    tf.keras.layers.Conv2D(filters = 32, 
                           kernel_size = (3,3), 
                           strides = (1,1), 
                           activation = 'relu',
                           padding = 'SAME'),    # input shape: (7, 7, 64)
    
    tf.keras.layers.MaxPool2D(pool_size = (2, 2), strides = (2, 2)),
    
    tf.keras.layers.Flatten(),
    
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

model_sup.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_3 (Conv2D)            (None, 14, 14, 64)        640       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 64)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 7, 7, 32)          18464     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 3, 3, 32)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 288)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2890      
=================================================================
Total params: 21,994
Trainable params: 21,994
Non-trainable params: 0
_________________________________________________________________
In [15]:
model_sup.compile(optimizer = 'adam',
                  loss = 'categorical_crossentropy',
                  metrics = 'accuracy')

model_sup.fit(X_train, 
              Y_train, 
              batch_size = 64, 
              validation_split = 0.2, 
              epochs = 50,
              verbose = 2)
Epoch 1/50
13/13 - 0s - loss: 13.1135 - accuracy: 0.1625 - val_loss: 4.6872 - val_accuracy: 0.3550
Epoch 2/50
13/13 - 0s - loss: 2.9685 - accuracy: 0.5100 - val_loss: 2.6768 - val_accuracy: 0.5650
Epoch 3/50
13/13 - 0s - loss: 1.4531 - accuracy: 0.7163 - val_loss: 1.4535 - val_accuracy: 0.7050
Epoch 4/50
13/13 - 0s - loss: 0.8610 - accuracy: 0.8050 - val_loss: 1.2013 - val_accuracy: 0.7300
Epoch 5/50
13/13 - 0s - loss: 0.4696 - accuracy: 0.8888 - val_loss: 0.9931 - val_accuracy: 0.7900
Epoch 6/50
13/13 - 0s - loss: 0.3089 - accuracy: 0.9225 - val_loss: 0.8995 - val_accuracy: 0.7850
Epoch 7/50
13/13 - 0s - loss: 0.1789 - accuracy: 0.9575 - val_loss: 0.8386 - val_accuracy: 0.8150
Epoch 8/50
13/13 - 0s - loss: 0.1131 - accuracy: 0.9712 - val_loss: 0.7900 - val_accuracy: 0.8200
Epoch 9/50
13/13 - 0s - loss: 0.0675 - accuracy: 0.9825 - val_loss: 0.7945 - val_accuracy: 0.8300
Epoch 10/50
13/13 - 0s - loss: 0.0453 - accuracy: 0.9912 - val_loss: 0.7918 - val_accuracy: 0.8250
Epoch 11/50
13/13 - 0s - loss: 0.0213 - accuracy: 0.9987 - val_loss: 0.7587 - val_accuracy: 0.8200
Epoch 12/50
13/13 - 0s - loss: 0.0131 - accuracy: 1.0000 - val_loss: 0.7566 - val_accuracy: 0.8200
Epoch 13/50
13/13 - 0s - loss: 0.0099 - accuracy: 1.0000 - val_loss: 0.7372 - val_accuracy: 0.8250
Epoch 14/50
13/13 - 0s - loss: 0.0075 - accuracy: 1.0000 - val_loss: 0.7440 - val_accuracy: 0.8200
Epoch 15/50
13/13 - 0s - loss: 0.0066 - accuracy: 1.0000 - val_loss: 0.7484 - val_accuracy: 0.8300
Epoch 16/50
13/13 - 0s - loss: 0.0059 - accuracy: 1.0000 - val_loss: 0.7439 - val_accuracy: 0.8250
Epoch 17/50
13/13 - 0s - loss: 0.0051 - accuracy: 1.0000 - val_loss: 0.7464 - val_accuracy: 0.8300
Epoch 18/50
13/13 - 0s - loss: 0.0047 - accuracy: 1.0000 - val_loss: 0.7444 - val_accuracy: 0.8400
Epoch 19/50
13/13 - 0s - loss: 0.0043 - accuracy: 1.0000 - val_loss: 0.7422 - val_accuracy: 0.8350
Epoch 20/50
13/13 - 0s - loss: 0.0040 - accuracy: 1.0000 - val_loss: 0.7401 - val_accuracy: 0.8400
Epoch 21/50
13/13 - 0s - loss: 0.0037 - accuracy: 1.0000 - val_loss: 0.7370 - val_accuracy: 0.8400
Epoch 22/50
13/13 - 0s - loss: 0.0034 - accuracy: 1.0000 - val_loss: 0.7380 - val_accuracy: 0.8400
Epoch 23/50
13/13 - 0s - loss: 0.0032 - accuracy: 1.0000 - val_loss: 0.7378 - val_accuracy: 0.8400
Epoch 24/50
13/13 - 0s - loss: 0.0030 - accuracy: 1.0000 - val_loss: 0.7338 - val_accuracy: 0.8400
Epoch 25/50
13/13 - 0s - loss: 0.0028 - accuracy: 1.0000 - val_loss: 0.7313 - val_accuracy: 0.8400
Epoch 26/50
13/13 - 0s - loss: 0.0027 - accuracy: 1.0000 - val_loss: 0.7331 - val_accuracy: 0.8400
Epoch 27/50
13/13 - 0s - loss: 0.0025 - accuracy: 1.0000 - val_loss: 0.7335 - val_accuracy: 0.8400
Epoch 28/50
13/13 - 0s - loss: 0.0024 - accuracy: 1.0000 - val_loss: 0.7324 - val_accuracy: 0.8400
Epoch 29/50
13/13 - 0s - loss: 0.0023 - accuracy: 1.0000 - val_loss: 0.7341 - val_accuracy: 0.8450
Epoch 30/50
13/13 - 0s - loss: 0.0022 - accuracy: 1.0000 - val_loss: 0.7297 - val_accuracy: 0.8450
Epoch 31/50
13/13 - 0s - loss: 0.0021 - accuracy: 1.0000 - val_loss: 0.7300 - val_accuracy: 0.8500
Epoch 32/50
13/13 - 0s - loss: 0.0020 - accuracy: 1.0000 - val_loss: 0.7311 - val_accuracy: 0.8450
Epoch 33/50
13/13 - 0s - loss: 0.0019 - accuracy: 1.0000 - val_loss: 0.7290 - val_accuracy: 0.8450
Epoch 34/50
13/13 - 0s - loss: 0.0018 - accuracy: 1.0000 - val_loss: 0.7263 - val_accuracy: 0.8500
Epoch 35/50
13/13 - 0s - loss: 0.0017 - accuracy: 1.0000 - val_loss: 0.7274 - val_accuracy: 0.8500
Epoch 36/50
13/13 - 0s - loss: 0.0016 - accuracy: 1.0000 - val_loss: 0.7245 - val_accuracy: 0.8500
Epoch 37/50
13/13 - 0s - loss: 0.0016 - accuracy: 1.0000 - val_loss: 0.7246 - val_accuracy: 0.8550
Epoch 38/50
13/13 - 0s - loss: 0.0015 - accuracy: 1.0000 - val_loss: 0.7258 - val_accuracy: 0.8500
Epoch 39/50
13/13 - 0s - loss: 0.0015 - accuracy: 1.0000 - val_loss: 0.7256 - val_accuracy: 0.8550
Epoch 40/50
13/13 - 0s - loss: 0.0014 - accuracy: 1.0000 - val_loss: 0.7236 - val_accuracy: 0.8550
Epoch 41/50
13/13 - 0s - loss: 0.0013 - accuracy: 1.0000 - val_loss: 0.7241 - val_accuracy: 0.8600
Epoch 42/50
13/13 - 0s - loss: 0.0013 - accuracy: 1.0000 - val_loss: 0.7237 - val_accuracy: 0.8550
Epoch 43/50
13/13 - 0s - loss: 0.0013 - accuracy: 1.0000 - val_loss: 0.7214 - val_accuracy: 0.8600
Epoch 44/50
13/13 - 0s - loss: 0.0012 - accuracy: 1.0000 - val_loss: 0.7240 - val_accuracy: 0.8550
Epoch 45/50
13/13 - 0s - loss: 0.0012 - accuracy: 1.0000 - val_loss: 0.7226 - val_accuracy: 0.8550
Epoch 46/50
13/13 - 0s - loss: 0.0011 - accuracy: 1.0000 - val_loss: 0.7208 - val_accuracy: 0.8600
Epoch 47/50
13/13 - 0s - loss: 0.0011 - accuracy: 1.0000 - val_loss: 0.7234 - val_accuracy: 0.8600
Epoch 48/50
13/13 - 0s - loss: 0.0011 - accuracy: 1.0000 - val_loss: 0.7229 - val_accuracy: 0.8600
Epoch 49/50
13/13 - 0s - loss: 0.0010 - accuracy: 1.0000 - val_loss: 0.7256 - val_accuracy: 0.8500
Epoch 50/50
13/13 - 0s - loss: 9.9502e-04 - accuracy: 1.0000 - val_loss: 0.7244 - val_accuracy: 0.8500
Out[15]:
<tensorflow.python.keras.callbacks.History at 0x1a8ba8db5f8>

Compare Self-supervised Learning and Supervised Learning

In [16]:
test_self = model_downstream.evaluate(X_test, Y_test, batch_size = 64, verbose = 2)

print("")
print('Self-supervised Learning Accuracy on Test Data:  {:.2f}%'.format(test_self[1]*100))
5/5 - 0s - loss: 1.1017 - accuracy: 0.8600

Self-supervised Learning Accuracy on Test Data:  86.00%
In [17]:
test_sup = model_sup.evaluate(X_test, Y_test, batch_size = 64, verbose = 2)

print("")
print('Supervised Learning Accuracy on Test Data:  {:.2f}%'.format(test_sup[1]*100))
5/5 - 0s - loss: 0.7423 - accuracy: 0.8300

Supervised Learning Accuracy on Test Data:  83.00%